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A METHOD FOR RECOVERING 3D SCENE STRUCTURE AND CAMERA 



MOTION FROM POINTS, LINES AND/OR DIRECTLY FROM THE IMAGE 



INTENSITIES 



5 RELATED APPLICATIONS 

The present application claims priority of provisional application 60/210,099 filed on June 7, 
2000. 

The present application is related to U.S. application Serial No. ,titled A Method for 

Recovering 3D Scene Structure and Camera Motion Directly from Image Intensities, filed on 
10 , by the same inventor as the present application, which related application is 

incorporated herein by reference. 
BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention is directed to a method for recovering 3D structure and camera motion, 
15 and more particularly to a linear algorithm for recovering the structure and motion data from 
points, lines, and/or directly from the image intensities. 

2. Prior Art 

The science of rendering a 3D model from information derived from a 2D image predates 
computer graphics, having its roots in the fields of photogrammetry and computer vision. 

20 

Photogrammetry is based on the basic idea that when a picture is taken, the 3D world is projected 
in perspective onto a flat 2D image plane. As a result, a feature in the 2D image seen at a 
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particular point actually lies along a particular ray beginning at the camera and extending out to 
infinity. By viewing the same feature in two different photographs the actual location can be 
resolved by constraining the feature to lie on the intersection of two rays. This process is known 
as triangulation. Using this process, any point seen in at least two images can be located in 3D. 
5 It is also possible to solve for unknown camera positions as well with a sufficient number of 
points. The techniques of photgrammetry and triangulation were used in such applications as 
creating topographic maps from aerial images. However the photogrammetry process is time 
intensive and inefficient. 

10 Computer vision techniques include recovering 3D scene structure from stereo images, where 
correspondence between the two images is established automatically from two images via an 
iterative algorithm, which searches for matches between points in order to reconstruct a 3D 
scene. It is also possible to solve for the camera position and motion using 3D scene structure 
from stereo images. 

15 

Current computer techniques are focused on motion-based reconstruction and are a natural 
application of computer technology to the problem of inferring 3D structure (geometry) from 2D 
images. This is known as Stracture-from-Motion. Structure from motion (SFM), the problem of 
reconstructing an unknown 3D scene from multiple 2D images, is one of the most studied 
20 problems in computer vision. 
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Most SFM algorithms that are currently known reconstruct the scene from previously computed 
feature correspondences, usually tracked points. Other algorithms are direct methods that 
reconstruct from the images intensities without a separate stage of correspondence computation. 
Previous direct methods were limited to a small number of images, required strong assumptions 
5 about the scene, usually planarity or employed iterative optimization and required a starting 
estimate. 

These approaches have complementary advantages and disadvantages. Usually some fraction 
of the image data is of such low quality that it cannot be used to determine correspondence. 
Feature-based method address this problem by pre-selecting a few distinctive point or line 
features that are relatively easy to track, while direct methods attempt to compensate for the 
low quality of some of the data by exploiting the redundancy of the total data. Feature-based 
methods have the advantage that their input data is relatively reliable, but they neglect most 
of the available image information and only give sparse reconstructions of the 3D scene. 
Direct methods have the potential to give dense and accurate 3D reconstructions, due to their 
input data's redundancy, but they can be unduly affected by large errors in a fraction of the 
data. 

A method based on tracked lines is described in "A Linear Algorithm for Point and Line 
20 Based Structure from Motion", M. Spetsakis, CVGIP 56:2 230-241, 1992 , where the 

original linear algorithm for 13 lines in 3 images was presented. An optimization approach is 
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disclosed in C.J. Taylor, D. Kriegmann, "Structure and Motion from Line Segements in 
Multiple Images, "PAMI 17:11 1021-1032, 1995. Additionally, in "A unified factorization 
algorithm for points, line segments and planes with uncertainty models" K. Morris and I. 
Kanade, ICCV 696-702, 1998, describes work on lines in an affine framework. A projective 
5 method for lines and points is described in "Factorization methods for projective structure 
and motion", B. Triggs, CVPR 845-851, 1996, which involves computing the projective 
depths from a small number of frames. "In Defense of the Eight-Point Algorithm: PAMI 19, 
580-593, 1995, Hartley presented a full perspective approach that reconstructs from points 
and lines tracked over three images. 

10 

The approach described in M. Irani, "Multi-Frame Optical Flow Estimation using Subspace 
Constraints," ICCV 626-633, 1999 reconstructs directly from the image intensities. The 
essential step of Irani for recovering correspondence is a multi-frame generalization of the 
optical-flow approach described in B. Lucas and T. Kanade, "An Iterative Image Registration 

15 Technique with an Application to Stereo Vision", IJCAI 674-679, 1981, which relies on a 
smoothness constraint and not on the rigidity constraint . Irani uses the factorization of D 
simply to fill out the entries of D that could not be computed initially. 

Irani writes the brightness constancy equation (7) in matrix form as A = -DI , where D 
tabulates the shifts d 1 and I contains the intensity gradients V / (p n ). Irani notes that D has 

20 rank 6 (for a camera with known calibration), which implies that A must have rank 6. To 
reduce the effects of noise, Irani projects the observed A onto one of rank 6. Irani then 
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applies a multi-image form of the Lucas-Kanade approach to recovering optical flow which 
yields a matrix equation DI 2 = -A 2 , where the entries of I 2 are squared intensity gradients I a I b 
summed over the "smoothing" windows, and the entries of A 2 have the form I a AL Due to 
the added Lucas-Kanade smoothing constraint, the shifts D or d ! n can be computed as D = - A 2 
5 [I 2 ] + denotes the pseudo-inverse, except in smoothing windows where the image intensity is 
constant in at least one direction. Using the rank constraint on D, Irani determines additional 
entries of D for the windows where the intensity is constant in one direction. 

SUMMARY OF THE INVENTION 

10 

The present invention is directed to a method for recovering 3D scene structure and 
camera motion from image data obtained from a multi-image sequence, wherein a reference 
image of the sequence is taken by a camera at a reference perspective and one or more 
successive images of the sequence are taken at one or more successive different perspectives 
15 by translating and/or rotating the camera. The method comprising the steps of: 

(a) determining image data shifts for each successive image with respect to the 
reference image; the shifts being derived from the camera translation and/or rotation from the 
reference perspective to the successive different perspectives; 

(b) constructing a shift data matrix that incorporates the image data shifts for each 

20 image; 
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(c) calculating two rank-3 factor matrices from the shift data matrix using SVD, one 
rank-3 factor matrix corresponding the 3D structure and the other rank-3 factor matrix 
corresponding the camera motion; 

(d) recovering the 3D structure from the 3D structure matrix using SVD by solving a 
5 linear equation; and 

(e) recovering the camera motion from the camera motion matrix using the recovered 
3D structure. 

The method of the invention is a general motion algorithm wherein the camera positions for 
each successive perspective do not lie on a single plane. The method can reconstruct the 
10 image from points, lines or intensities. 

The present invention is an essentially linear algorithm that combines feature-based 
reconstruction and direct methods. It can use feature correspondences, if these are available, 
and/or the image intensities directly. Unlike optical-flow approaches such as that described 
15 in "Determining Optical Flow", B.K.P Horn and B.G. Schunck, AI 17, 185-203, 1981, the 
algorithm exploits the rigidity constraint and needs no smoothing constraint in principle 
assuming the motion is small and the brightness constancy equation is valid. However, in 
practice, it is sometimes necessary to impose smoothness in the algorithm. 

20 The method of the present invention can reconstruct from both point and line features. This 
is an important aspect of the present invention, since straight lines are abundant, especially in 
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human-made environments, and it is often possible to track both feature types. Lines can be 
localized very accurately due to the possibility of finding many sample points on a single 
line, and they do not suffer from some of the problems of point tracking such as spurious T- 
junctions and the aperture effect. The method described here is the first that can reconstruct 
5 from all three types of input data in any combination or separately over a sequence of 
arbitrarily many images under full perspective. 

BRIEF DESCRIPTION OF THE DRAWINGS 

10 These and other features, aspects, and advantages of the methods of the present invention will 
become better understood with regard to the following description, appended claims, and 
accompanying drawings where: 

FIG, 1 schematically illustrates a hardware implementation of the present invention. 

15 

FIG. 2 is a block diagram that illustrates the method of the present invention. 
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

20 Definitions 

For purposes of this disclosure, the invention will be described in accordance with the 
following terms defined herein. The method of the present invention assumes that the 
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calibration is known and takes the focal length to be 1 (wlog). Modifications to the method 
for uncalibrated sequences will be discussed later. 

Assuming a sequence of N i images. Choose the zeroth image as the reference image. LetR 1 , 
T 1 denote the rotation and translation between this image and image /. Parameterize small 

6) = \0) 1 6) X C)\ [t^I = (t'T'Y 

rotations by the rotational velocity ^ * ? y9 z} . Let 1 h v x y) . 

Let a 3D point P transform as ^ " ^ . Assume N p points and N L lines tracked over the 
sequence. For clarity, we use a different notation for the tracked points than for the pixel 

positions in the intensity images. Let q l m s [q x ,q y ) denote the m-th tracked point and 
A\ = (a 9 p 9 y) l denote the t _ th line in the /4h i maget Let $ ~ . Let R*q denote the 



. Define the 



image point obtained from q after a rotation: 
three point rotational flows ^(^^(^j),^*,;*'), by 

L -9m denotes the 3D unit vector 

ql\q\- fclf/MI sim . larlyj A- AI\A\ Let deno te the image 

displacement for the m-th tracked point andg m = (Q x ,Q y ,Q 2 Y denote the 3D scene point 

corresponding to the tracked point q m . Then parameterize a 3D line L by two places that it 
lies in. The first plane, described by A, passes through the center of the projection and the 
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imaged line in the reference image. The normal to the second plane, B 9 is defined by 



requiring B -A = 0 and B • Q = -1 for any point Q on L. 



Let 



p n = (jc b .y n ) T be the image coordinates of the n-the pixel position. Order the pixels from 



1 to N x 9 the total number of pixels. Let/' denote the i-the intensity image and let 
V n = V (p n ) denote the image intensity at the n-the pixel position in V . Let P n denote the 
3D point imaged at p n in the reference image, with P n = {X n ,X n ,Xnf in the coordinate 
5 system of 7° . Let d l n denote the shift in image position from 7° to / of the 3D point P n . 

Suppose V a is a set of quantities indexed by the integer a. We use the notation fr} to 
denote the vector with elements given by the V a . 

10 Algorithm Description-Preliminary Processing 

The method of the present invention requires that the translational motion not be to large, 

(e.g., with \T I Z min | < 1 / 3 and that the camera positions do not lie in a plane. 

Given tracked points and lines, we approximately recover the rotations between the reference 
image and each following image by minimizing: 



15 




0) 



m 



with respect to the rotations R l 9 where one should adjust the constant according to the 



relative accuracy of the point and line measurements. Minimizing equation (1) gives 
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accurate rotation as long as the translations are not too large. After recovering the rotations, 

compensation is made for them and henceforth they can be treated as small. 

Points 

Point displacements are calculated in accordance with J. Oliensis, "A Multi-Frame Structure 
from Motion Algorithm under Perspective Projection", IJCV 34:2/3 163-192, 1999, 
incorporated herein by reference, the point displacements are 



l-Q~ l T 



■ + 



f (r 1 , q l m ) where the rotational flow 



9 Qm ) = Qm ~ ) 1 * Qm - Under the assumptions, we can approximate 
« Gi kX - V 1 \ )+ ) + <r (2) fe» ) + ^r% m ) where the last three terms give 

10 the first-order rotational flow. Correspondingly, define the three length-2^ translational 
flow vectors 



O =- 



<X> = ■ 



.*)}. 

{0}' 

ke; 1 }" 
fee;'} 
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and the length -2N rotational flows 



* = 



{-?(«)}. 



5 Then collect the shifts, 5' into a iV. x 2N matrix 5, where each row corresponds to a 



different image / and equals [{^ f {s* y J ] . Then 

s * fcX +{tM +{tM . 



(2) 



Lines 



The line measurement model is described as follows. One can reasonably assume that the 
noise of a measured line is proportional to the integrated image distance between the 
15 measured and true positions of the line. Let @ FOV denote the field of view (FOV) in radians. 
Typically, @ FOV < 1 . If the measured line differs from the true one by a z-rotation, this gives 
a noise of o((© FOV / 2) 2 ). A rotation around an axis in the x-y plane gives a noise of 
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0(© FOV ), which is typically larger. (These estimates reflect the fact that the displacement 
region bounded by the true and measured lines look like two triangles in the first case and a 
quadrilateral in the second. For the representation described herein, this implies that line 



measurements determine A z more accurately thm A x ,A y by a factor roughly of ^ 



FOV 



10 



The Image flow for lines can be described as follows. Let A = (A;0) 
and B = (B;-l) be the homogeneous length-4 vectors corresponding to the plane normals A 
and B for the line L. Let Q = (Q;l) be the homogeneous vector corresponding to a point on 
L. Then A-B = A-Q = B-Q = Q. After rotation R and translation T, we determine the 
transformed A'B' from the requirement that A r • Q - B f ♦ Q = 0 . We can satisfy this 



requirement using A* - 



RA 



RB 

T - B + B A 



. But A* doesn't necessarily have A f 4 = 0 , 



as the A' for the new image of the line must. So we set A = £ ~ A\B* I B\ , which implies 



that the new image line is given by A r = R 



A- B 



T * A 



T -B + B 



(3) 



4 J 



Now consider the line flow SA} = 4 - A® . Define SA R = A f - R 1 A / and 

(r-A). 



15 SAr~R- l A'-A = 



1-T-B 



B so 8A = 8A T +5A R . For small rotations and translations, with 



T-B «1, 



M *(T-A)B + <yxA. 



(4) 
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B * A = 0 implies that SA • A « 0 . To eliminate the scale ambiguity in defining each A, we 



take 



= 1 in the reference image and require that the line flow satisfies 



0 = SA\ • Af = {a!j -Af)- A° exactly. We normalize all A 0 to the same magnitude in the 
reference image to reflect the fact that the measurement errors should be similar for all lines. 
5 The requirement that 0 = SA) • A° "fixes the gauge" in a way that is consistent with our line 
representation and small-motion assumption. 

After "gauge fixing" each SA] incorporates just two degrees of freedom. We represent SA) 
by its projection along two directions, A, x (z x A l ) and zxA n which we refer to respectively 
as the upper and lower directions. Let the unit 3-vector and P{ project onto these two 
10 directions. For typical © F0V < l 9 \A { • z| « 1 and the upper component of A l roughly equals 
Aj z . Thus, image measurements determine the upper component more accurately than the 
lower, by roughly 1 / ® FOV . In analogy to the point definitions, define the line translation 



flows 



where these are length- 2 N L vectors. B v = B*P U and B L =B*P L are the upper and lower 
components of B. Similarly, define the rotational flows 
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(5) 



Let A be theiV; x 2N L matrix where each row corresponds to a different image / and equals 



K-^A'f {Py-JA'f . Then 



(6) 



Intensities 

One can apply the same arguments to the 5 the image shifts corresponding to the 3D point 
10 imaged at the pixel position p n , as to the s' m for the tracked feature points. Thus 

< ^^(pX-lT'lJ+^^J+^tpJ+^fcJ. Let V/„ -V/frJ-fc,/^ 

represent the gradient of the image intensity for 1° 9 with some appropriate definition for a 
discrete grid of pixels. Let M l n define the change in intensity with respect to the reference 
image. Its simplest definition is M l n = V n -f n . The brightness constancy equation is 
15 A/;+V7^d^0. (7) 
This and the previous equation imply that 

-All =V/„-dl -Z; 1 (v/..pX'-V/>'l)+V/ B .(^ + ©^ ) +^). Define the three 
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length -N x translational flow vectors 0 /x ^ -{z' 1 1 x },0 Iy = -{z~ l I y \® h = {z _1 (W-p)}, and 
the length - N x rotational flows, ^ = -{vi ■ r (l) (p)}, V ly = -{vi ■ r (2) (p)}, ¥ b = -{vi ■ r (3) (p)} . 

Let A be a N { x N x matrix, where each row corresponds to an image i and equals {AT f . 

Then A * {T x K + {?; K + & + K W + K K + K W • (8) 

5 

Reconstruction 

The reconstruction of the 3D scene is described as follows. Define the total rotational flow 
vectors for points, lines and image intensities by = [y^Y^^ ], and similarly for the 
y and z components. Here co h and 6) x are constant weights that the user should set according 

10 to the relative noisiness of the point, line, and intensity data. The x ¥ a have length 

2N + 2N L +N X = N toL . One can verify from the definitions that the are computable 
from measured quantities and can be taken as known. Using Householder matrices, one can 
compute a (N tot -3)x N tot matrix H annihilating the three ^ x y z . Computing and 
multiplying by H cost O ( N tot ) computation. Define the N [ x N tot matrix 

15 D = [S o) L A o) h A] . Let C be a constant (N I - 1) x (Nj - 1) matrix with C ir = S w + 1 , we 
include C to counter the bias due to singling out the reference image for special treatment. 



Define D = C ^DH T . Define the total translational flow vectors by <X> X = 



, and 
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similarly for y and z. From above, equations (2), (6) and (8) imply that 

d ch * c"^ {t x }Wr t + c"^ {r y jo Jh t + c"^ {r z }o Jh t . (9) 

D CH is approximately rank 3. 

Our basic algorithm of the method of the present invention is, 
5 1) Define H and compute D CH . Using the singular value decomposition (SVD), 

compute the best rank-3 factorization of D CH « M®S® T where are rank 3 

matrices corresponding respectively to the motion and structure. 

2) S {3) satisfies, [o, O y oJ= H T S {3) U + ft x T v (10) 

where U and Q are unknown 3x3 matrices. We eliminate the unknowns Q z9 B z and Z 1 
10 from the <f) a in this system of equations to get 3N tot linear constraints on the 18 unknowns in 
U and Q . We solve these constraints with 0(N tot ) computations using the SVD. 

3) Given U and £1 , we recover the structure unknowns Q z , B z and Z" 1 from 

[o x ®^ <J> J = H T S®U + fpj W y ¥ z ]q where Q is the 3d coordinate for an image pixel in the 

reference image, B is the shortest vector from the camera center to a 3D line and Z is the 
15 depth from the camera to a 3D scene alone the cameras optical axis. 

4) Given U , we use S {3) U * <P V O z ] and 

D OT * C"^ }o r H T + C"^ {r v jo ^H T + C"^ {r z }o JH T to recover the translations. 

5) We recover the rotations a) l x , co 1 , co l z from 
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'X +KXn =c~ K Di -{c~ %> r +{r y }o T y +{T z }® T z )j 



n 



The description above omits some steps, which are important for bias correction. 1) The 



the greater accuracy in the measurement of these components. 2) To correct for bias due to 
5 the fact that the FOV is typically small, in Steps 2-4 we weight T z by a factor proportional to 
the FOV and weight Q? z by the inverse of this, while leaving the x andy 
Components untouched. 3) The steps of the algorithm can be iterated to give better results 
and correct for the small motion assumptions. This involves multiplying the original feature 
point shifts by 1 - Z~ X T Z and the line shifts by 1 - B • T . The algorithm is guaranteed to 
10 converge to the correct reconstruction if the motion and noise are small and the camera 
positions do not lie on a plane. 

Of course, the results for the intensity-based part of our algorithm depend crucially on the 
technique used for computing derivatives. To make this more robust, one should iteratively 
15 reduce the size of the displacements d[ by warping and recomputing the reconstruction in a 
coarse-to-fine mode. One implementation simply computes the image derivatives at a single 
scale, using consistency between the spatial derivatives computed for different images to 
determine whether the assumption of small image motion holds for this current scale. Only 
those pixel sites where the assumption does hold are used for the motion computation, 
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We have sometimes found it useful to preprocess the image intensities prior to running the 
algorithm. We first compute a Laplacian image and then transform the intensities by a 
stigma function to enhance edgy regions and suppress textureless ones. This gives the 
intensity image a form that is intermediate between the unprocessed images and a selected set 
of tracked features. 

Bas-relief for Lines 

Due to the bas-relief ambiguity, it is difficult to recover the constant component of the vector 
{Z" 1 } for point features, though typically one can recover all other components accurately. A 
similar result can be derived for lines. The rotational flow for a line is w x A . For typical 
© F0V « 1 5 \A Z \ « \A x y \ . Thus x and y rotations give flows roughly proportional to 
yxA — A x z and (x x A) — A z . From equation (4) above, the effects of a z-translation T z 

are suppressed by \A Z \ , and the translational flows due to the constant component of {B z } 
constant component from rotational effects, which makes this component difficult to recover 
accurately. 

One can get a more quantitative analysis of this effect as follows. Assuming that the input 
data consists of lines only, [O x O x O x ] = YL T S {3) U + ^¥ X % ¥ 2 ]q implies that 
P 5 H[o u? ® L> , ®^] = 0 where ? s is a projection matrix that annihilates S^K Those {b} 
components that produce small overlaps P^H®^ , will also produce small 
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overlaps ^ O^H^HO^ . It follows from standard perturbation theory that noise will 
mix these components into the true solution for the B, where the amount of contamination is 
inversely proportional to the size of the eigenvalue. Define B = [{B x }; {B y \ {B z }]. We 

computed for several sequences the eigenvalues of H 5 , defined such that 
5 B T U B B = ]T O^H 7 HO u , and found that, as expected, there was one small eigenvalue, 

whose eigenvector approximately equaled the constant component of {B z }. Thus one can use 
the input data to estimate the inaccuracy in recovering this component. 
General-Motion Algorithm for Intensities 
The General Motion Algorithm for just Intensities is as follows. 
10 0. First, assuming that the translations are zero, we recover the rotations and warp all 

images l\..J Nl ~ l toward the reference image 7°. In this case we then let the image 
displacements <T B refer to the unrotated images. 

-y T 

1 . We then compute H and A CH = C n AH , and using the singular value 
decomposition (SVD), compute the best rank-3 factorization of - A CH D CH » M^S^ T > 
15 where M (3 ^ and are rank 3 and correspond respectively to the motion and structure. 

2a. Further S (3) satisfies, <D = H r S (3) /7 + Vtl 



where U and Q are unknown 3x3 matrices. We eliminate the unknowns Z 1 from the & a 
NECI1071 19 g\nec\1196\I3701\spec\13701pjl 



13701.pje 

in this system of equations to get 3N p linear constraints on the 18 unknowns in U and Q . 

We solve these constraints with o{n ) computations using the SVD. Then given U and 

Q , we recover the structure unknowns Z 1 from (5). 

2b. Then given U , we use S®U « O and equation (4) to recover the translations. 

5 

3. The rotations W can then be recovered from C"^WP r = -C~% (T<£ r + a). 

Algorithm Analysis 
Step 0. 

10 Our neglect of the translations in the basic general-motion algorithm above produces 

small errors in recovering the rotations when the translations are moderate, with 

|r^ /e |/Z min < 1/4 as described in "Rigorous Bounds for Two-Frame Structure from Motion/ 5 

J. Oliensis IUW, 1225-1231, 1994. We describe two multi-frame techniques for recovering 

-y ~y T 

the rotations. Neglecting the translations in (4) gives - C n A « C /2 W*¥ , which can be 
15 solved for W. This technique resembles the projective algorithm of L. Zelnik-Manor and M 

Irani, described in "Multi-Frame Alignment of Planes", CVPR 1:151-156, 1999 for 

-y 

reconstructing planar scenes, except that our use of C n reduces the bias due to 
overemphasizing the noise in the reference image. As usual, one can extend the technique to 
larger rotations by iteratively warping and re-solving for W. 
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The technique just described is best for small rotations, since it uses the first-order 
rotational displacements. Another approach begins with the extension to planar scenes as 
described in "Structure from Motion from Points, Lines and Intensities, J. Oliensis and M. 
Wennan CVPR 2000 of the multi-frame Sturm-Triggs technique set forth in "A factorization 
5 based algorithm for multi-image projective structure and motion", P. Sturm and B. Triggs, 
ECCV 709-720, 1996. This multi-frame approach recovers homographies of arbitrary size 
between the reference image and the subsequent images. One can recover the rotations by 
choosing the orthogonal, positive-determinant matrices closest to the recovered 
homographies. 
10 Step 2a. 

From (4), the columns of S^ generate approximately the same subspace as that 
generated by the <$> x , ® y , <D Z : 

S ( % * HO) 

where U is a 3 x 3 matrix, (7) Above gives an overconstrained system of linear equations 
15 that one can solve directly for U and Z~ { . However this is an o{n 1 ) computation. For 
large images, we use instead the following o(n ) technique. 

First we define P = H r H , a projection matrix. Multiplying (7) by H r gives 
H r S { % » P<D , which implies 
(D«H r S (3) + x PQ 

20 We eliminate the Z~ l from (8) above and solve directly for U and Q . Denote the 
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columns of Uby U s \u x U 2 U 3 ] 9 and similarly for Q . Let U[ = s~ l U 3 and Q f 3 = s~ l Q^ , 
where the scale s equals the average distance of the image points from the image center. We 
include s to reduce the bias of the algorithm. From the definitions of G) x , <£> y , <D Z , (8) above 

implies + *Uh t S% 2 +TO 2 1 i ,«/ j<i [h t S%] +W 2 ] n 

5 - s~ l (VI n * p„ \h t S [3) U 2 + *¥Cl 2 ] n « [ff r S ( % 3 ' and a similar equation 

with(x «-» y) . Step 2 A of our algorithm solves this system of linear equations for Q and U 
in the least-squares sense and then recovers the Z" 1 from these solutions and (8) above. The 
computation is o(n ) . Note also that step 2a bases its computations on V/ . It we use the 

value of V/ computed from the measured reference image/ 0 ,then the estimates of U, Q, Z" 1 
10 will not be true multi-frame estimates. To get a better multi-frame estimate, one can first re- 
compute 1° and V/ based on all the image data and use the result to compute U 9 CI via (9) 
as follows. 

Let be the best rank 3 approximation to A Cff = C ^AH r . Up to unknown 
rotational flows, A^H gives the intensity shifts due to the translational displacements: 

15 C"^A« A^H + W 7 " 

We then solve (10) in the least-squares sense for W, which gives improved estimates 
A e for A and I° e for 1° . These can then be used in (9). However, this computation of I° e 
gives o(n~ x (d 2 , NJ 2 t 2 Z~ 2 ) errors. If the original noise in 1° is less than this, one should use 
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1° directly in step 2a. 

Step 2b. 

We haveC~^T » M {3) U T , where t/ f is also a 3 x 3 matrix. The 
algorithm recovers U T and T by solving the linear system - A CH & M^U T Q T H T , 
5 for U T , using the <D computed previously, and then plugging in the result to recover T . 

General Notes 

Experimentally, we find that the intensity pattern varies significantly from image to 
image in our sequences. We actually apply our approach to a modified version of the original 
10 sequence, obtained by: 

1) Filtering the sequence with a laplacian of gaussian to emphasize edges; 

2) Applying a nonlinear transformation (a sigma function) to further emphasize the 
high-frequency features. The new sequences give a continuous representation of the 
interest features, which can be used to compute derivatives. This procedure 

15 emphasizes the discrete, more easily trackable features, but does not eliminate the 

information from other image regions. One could extend this approach to deal with a 
variety of interest features, selecting among them to determine the derivatives that are 
likely to give the best flow. To ensure that the BCE is approximately valid, we also 
require that the spatial derivatives be consistent over the whole sequence, and that the 

20 image intensity be well approximated by a plane at the scale at which we are 
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working. In our current implementation, we apply the algorithm at a single scale. 
In another embodiment of the present invention, the algorithm described above can also 
handle: cameras with changing and unknown focal lengths, where the calibration is otherwise 
fixed and known and camera calibrations that change arbitrarily from image to image in an 
5 unknown fashion (this is know as the projective case). 

For varying, unknown focal lengths, one can modify the algorithm along the line of the 
method described in "A Multi-frame Structure from Motion Algorithm under Perspective 
Projection" IJCV 34:2/3, 163-192, 199 and Workshop on Visual Scenes 77-84, 1999. The 
modification is described as follows, in Step 2a, we define H to annihilate W F = {p n • Vl(p n )} in 
10 addition to ^P^^,^ . In recovering the Z" 1 one must replace Yin (8) by and 

Q becomes a 4 x 3 matrix. The modified Step 2a can recover the Z 1 except for the constant 
(Z n = l) component, which is intrinsically difficult to recover accurately when the focal lengths 
are unknown and varying. 

For the projective case, the algorithm can be modified along the lines of the method 
15 described in "Fast Algorithms for Projective Multi-Frame Structure from Motion", J. Oliensis 
and Y. Gene, ICCV 536-543, 1999. In this case, we define H to annihilate eight length- 
vectors, where the n-th entries of these vectors are given by the eight quantities 
( V/ «L P«*( v/ Jc> P^( v/ J'P«> for a 9 b 9 c,de{x,y}. Step 2a must be modified by replacing 
*F in (8) by a matrix consisting of these eight vectors. 

20 
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Projective Algorithm 

The algorithm described here for calibrated sequences generalizes in a straightforward way to 
deal with uncalibrated sequences. In this projective case, instead of a preliminary stage of 
rotation computation, one computes planar homographies between the reference image and 
5 each subsequent image. It is worth noting that one can easily and accurately compute these 
homographies by a multi-frame generalization of the projective-reconstruction method. 
We briefly describe how this works for the example of tracked points. Assume the scene is 



10 where M l is a 3x3 matrix (a homography), S m is the structure 3-vector giving the position 
of the m-th point on the plane (in some arbitrary coordinate system), and X m is the projective 
depth. The steps of the homography recovery are: 



planar. Then X m ^ m =M ! S 



m 



(11) 



1) 



Take X m - 1 initially. 



For the current estimate of the X m , collect the X m q' m into a single 3Nj x N p 



15 



matrix T . Use the SVD to decompose this into the product of two rank 3 factors: 



3) 




to Step 2. 



After convergence, contains the M 1 estimates. Our estimate of the homography taking 
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the reference image to image / is M°\M l . This procedure is guaranteed to converge to the 



local minimum of 



p(4)| 2 / 

j , where is the difference between T and its best rank- 3 

rf 



approximation. Also, it gives nearly the optimal estimates of the M l if the true motion is not 
too large and the scene is truly planar. 

5 

The uncalibrated version of the line-based algorithm is described as follows. Let K l be the 
standard 3x3 matrix, with zero lower-diagonal entries, giving the calibration parameters for 
the z-th image. For the uncalibrated case, equation (3) still holds if one replaces 

R -» (K r )~ T RK == M H and T^AT, where K and K' are the calibration matrices for the 
10 first and second image. M H is a general homography and is the inverse transpose of the 
analogous homography for points. The first order approximation of the new version of 



A' = R 



A-B- 



v T-B + 5 4 , 



is ^A 1 «(T-A)B+^xA,butwith T»>ZTandwith 



a>xA replaced by 8M H A , where M H s 1 + 8M H . 

We define H to annihilate the first-order homography flow due to 8M H , instead of the 
15 rotational flows as for the calibrated case. Since the first-order approximation of M~J is 

1 - 8M T H , one can easily define H to annihilate the first-order flows for both points and lines. 



Represent M H = 



FG 
J T 0 



, where F is 2x2 and G and / are 2 x 1 . The first-order point and 
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line displacements due to each of the 8 parameters in 5M H are given in the following table: 









-^2,2 


^2,2 




G 2 


j, 


J 2 


A X: 


A x 


Ay 


0 


0 


A z 


0 


0 


0 




0 


0 


A x 


Ay 


0 


A z 


0 


0 


A Z: 


0 


0 


0 


0 


0 


0 


A x 


Ay 


IX: 


-X 


0 


-y 


0 


- X 2 


-xy 


-1 


0 


q y . 


0 


-X 


0 


-y 


-xy 




0 


-1 



We define H to annihilate the 8 vectors associated with the columns of this table. After the 
5 indicated replacements, the uncalibrated algorithm is the same as the calibrated one. 

If one fixes the point coordinates in some reference image, the remaining freedom under a 
projective transform is Z~ l ax + by + c + dZ~ l where x and y are the image coordinates. 
This corresponds to scaling and adding an arbitrary plane to the structure. 
One can derive a similar relation for B. Let P be a 3D point on the line determined by B. 

10 Then B-P=-l implies B * (x> y$) - -Z l . A projective transform must preserve B'«P' = -1. 
From this, and the transform of Z" 1 given above, it follows that B' -B = (a,Z>,c) up to a 
scaling, with the same constants a, b, c as for the point transform. This relation holds with 
the same constants for any line B. 

15 The fact that B is recoverable only up to an overall shift is also clear from the uncalibrated 
version of the algorithm. When we use H to eliminate the first-order effects of small 
homographies, we also eliminate the translational image flows due to the three constant 
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components of the B. Thus, these components cannot be recovered. 
Implementation 

5 It will be apparent to those skilled in the art that the methods of the present invention disclosed 
herein may be embodied and performed completely by software contained in an appropriate 
storage medium for controlling a computer. 

Referring to Fig. 1, which illustrates in block-diagram form a computer hardware system 
10 incorporating the invention. As indicated therein, the system includes a video source 101 , 
whose output is digitized into a pixel map by a digitizer 1 02. The digitized video frames are then 
sent in electronic form via a system bus 103 to a storage device 104 for access by the main 
system memory during usage. During usage the operation of the system is controlled by a 
central-processing unit, (CPU) 105 which controls the access to the digitized pixel map and the 
15 invention. The computer hardware system will include those standard components well-known 
to those skilled in the art for accessing and displaying data and graphics, such as a monitor, 106 
and graphics board 107. 

The user interacts with the system by way of a keyboard 108 and or a mouse 109 or other 
20 position-sensing device such as a track ball, which can be used to select items on the screen or 
direct functions of the system. 
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The execution of the key tasks associated with the present invention is directed by instructions 
stored in the main memory of the system, which is controlled by the CPU. The CPU can access 
the main memory and perform the steps necessary to carry out the method of the present 
5 invention in accordance with instructions stored that govern CPU operation. Specifically, the 
CPU, in accordance with the input of a user will access the stored digitized video and in 
accordance with the instructions embodied in the present invention will analyze the selected 
video images in order to extract the 3D structure and camera motion information from the 
associated digitized pixel maps. 

10 

Referring now to Fig. 2 the method of the present invention will be described in relation to 
the block diagram. A reference image of the sequence is taken by a camera at a reference 
perspective and one or more successive images of the sequence are taken at one or more 
successive different perspectives by translating and/or rotating the camera in step 201 . The 

15 images are then digitized 202 for analysis of the 3D image content, i.e. points, lines and 
image intensities. Then image data shifts for each successive image with respect to the 
reference image are determined 203; the shifts being derived from the camera translation 
and/or rotation from the reference perspective to the successive different perspectives. 
A shift data matrix that incorporates the image data shifts for each image is then constructed 

20 204 and two rank-3 factor matrices from the shift data matrix using SVD 5 one rank-3 factor 
matrix corresponding the 3D structure and the other rank-3 factor matrix corresponding the 
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camera motion are calculated 205. Recovering the 3D structure from the 3D structure matrix 
using S VD by solving a linear equation 206. The camera motion can then be recovered using 
the recovered 3D structure 207. 

While there has been shown and described what is considered to be preferred 
embodiments of the invention, it will, of course, be understood that various modifications and 
changes in the form or detail could readily be made without departing from the spirit of the 
invention. It is therefore intended that the invention be not limited to the exact forms described 
and illustrated, but should be constructed to cover all modifications that may fall within the 
scope of the appended claims. 
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WHAT IS CLAIMED IS: 

1 . A method for recovering 3D scene structure and camera motion from image data 
obtained from a multi-image sequence, wherein a reference image of the sequence is taken by 
5 a camera at a reference perspective and one or more successive images of the sequence are 
taken at one or more successive different perspectives by translating and/or rotating the 
camera, the method comprising the steps of: 

(a) determining image data shifts for each successive image with respect to the 
reference image; the shifts being derived from the camera translation and/or rotation from the 

10 reference perspective to the successive different perspectives; 

(b) constructing a shift data matrix that incorporates the image data shifts for each 

image; 

(c) calculating two rank-3 factor matrices from the shift data matrix using SVD, one 
rank-3 factor matrix corresponding to the 3D structure and the other rank-3 factor matrix 

1 5 corresponding tothe camera motion; 

(d) recovering the 3D structure from the 3D structure matrix by solving a linear 
equation; and 

(e) recovering the camera motion from the camera motion matrix using the recovered 
3D structure. 

20 



2, The method of claim 1 , wherein the image data is one or more selected from the 
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3. The method of claim 1, wherein the step of determining image data shifts includes 
initially recovering and compensating for camera rotation. 



10 



15 



20 



25 



4. The method of claim 1 , wherein step (b) comprises: 

computing H and D CH , where H is a (N tot -3)x N tot matrix and D CH =C V2 D H T 
and N tot = 2N p + 2 Ni + Nx where TVp, Afc, and 7^T e q ual the number of points, lines and 
intensities, respectively, and C is a (N T -1) x (N x -1) matrix with C ir = 5.., + 1 where Q is a 
constant table of values for all images and£.., is an operator equal to 1 iff = /' and 0 if / * i r , 
and D = [S a> L A ^ A] , H T is the transpose of the H matrix, where H is a matrix defined 
such that H T is an identity matrix and annihilates where ¥J = fpj fi> L *F£ co^l] 

and similarly for the y and z components where, a> l and a> L are constant weights that the user 



sets, and where = 



{P^(xxA)[ 

Uv(£xa)}_ 



W = 



{p^-lyxA)}' 
.{v(y*A)}. 



vp - 



^•(zxA)}- 
.{P,-(zxA)}. 



where ^is 









>,%)}" 












|f(«)}. 




>»(«)}. 



the unit normal to the plane containing the line and the camera center at the reference image 
and where x, y, z are unit vectors in the x, y and z directions, and where 

where q=(x,y) is the image position of the 
tracked point in the reference image and 

the three point rotational flows r {l) (x,y),r (2) (x,y),r {3) (x,y) are defined by 

[ r <vv»]=[(:£ ;,,)),(•"').(/)] , „ 

L v } J and where 

^ = -{w • r (i) (p)} 5 *¥ ly s -{w ■ r (2) (p)} ? Y b = ~{w * r (3) (p)} 5 and where /> gives the image 
coordinates of the pixel positions, and 

where A is an N 2 x 2N L matrix where each row corresponds to a different image / and 
equals {p^ • SA* f {p z • SA* f where P^ and P L are unit 3-vectors projecting onto two 
directions A / x(fxA / ) and z x A ; which we refer to respectively as the upper and lower 
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directions, where SA { is the line flow SK\ = - A° , and z is the unit vector in the z 
direction and where S is a N i x 2N matrix where each row corresponds to a different image 

i and equals [{s^ f \p y f ] , where s l m = q' m - q° and denotes the image displacement for the 
m-th tracked point and where A is a N f x iV x matrix, where each row corresponds to an 
5 image i and equals \aI* f where AI is the change in image intensity with respect to the 

reference image and where V denotes the /-th intensity image and where V n - V (p n ) denotes 

the image intensity at the «-th pixel position in/' and where the notation { V) is used to 
denote a vector with elements given by the V a . 

10 

5. The method of claim 1, wherein step (c) comprises: 

computing the best rank-3 factorization of T) CH « M (3) S (3)r where M (3) , S {3) are rank 
3 matrices corresponding respectively to motion and structure, using SVD. 



15 



The method of claim 1, wherein step (d) comprises: 



20 



25 



eliminating structure unknowns Q z9 B z and Z~ l from the <5 a to get 3N tot linear constraints 

on the U and Q using the linear equation, [o x W y O z ] = H T S (3) U + $> x W y W z lfl 9 where U 

and Q are unknown 3x3 matrices, and O and ¥ represent total translational and rotational 
flow vectors respectively, and solving these constraints with 0(N tot ) computations using the 

SVD, where the total translational flow vectors are defined by <b x = 



and similarly 



for the y and z components, and 

















\a x b l }_ 








M Z B L }_ 



where A is the normal to a first plane that 



passes through the center of projection and the imaged line in the reference image, and the 
normal to a second plane, B is defined by requiring B • A = 0 and B • Q = - 1 for any point Q 
the 3D line L, and where B u ^B^P U and B L =B- ? L are the upper and lower components of 
B, and where 
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. {0} j 



{or 



o = 



ke;'}" 
ke;') 



and where 



^/,—{^l<E>^-{^X},0 /z -{z-^V/.p)}, and 

where Q is the 3d coordinate for a 3D tracked point corresponding to an image pixel 

in the reference image, and Z n is the depth from the camera to the 3D point imaged at the n- 

th pixel along the cameras optical axis; and 

10 recovering the structure unknowns Q z 9 B z and Z" 1 from 

[O x O x $J = H r 5% + [^¥ y ¥ z ]a ? given U miCl. 



7. The method of claim 1 , wherein step (e) comprises: 
using S^U * [o x O y O z ] and 



15 D CH « C^ 2 {T x }<D r H T + C^ 2 {t v )o^H t + C~ }/2 {T z }<DjH T to recover the translations, and 



recovering the rotations o) x ,cd'cq z from 



constant (N x -1) x (N^l) matrix with C u , = S u , + 1 and 7" 1 represents the translation. 



8 . A method for recovering 3D scene structure and camera motion from image data 
obtained from a multi-image sequence, wherein a reference image of the sequence is taken by 
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a camera at a reference perspective and one or more successive images of the sequence are 
taken at one or more successive different perspectives by translating and/or rotating the 
camera, the method comprising the steps of: 

(a) initially recovering and compensating for camera rotation after warping the 
5 reference image by the previously estimated translational displacements; 

(a) determining image data shifts for each successive image with respect to the 
reference image; the shifts being derived from the camera translation and/or rotation from the 
reference perspective to the successive different perspectives; 

(b) constructing a shift data matrix that incorporates the image data shifts for each 

10 image; 

(c) modifying the image shifts by multiplying the image shifts 8A\ , s l m , AI l n and 
redefining these quantities to include the denominator factors 1 - Q~% for points, 1 - Z~ l T z 
for intensities and l-B-T for lines wheres^ s q' m - q° m and denotes the image displacement 
for the m-th tracked point, where M[ is the change in image intensity and where V denotes 

15 the z-th intensity image and where V n =I% n ) denotes the image intensity at the «-th pixel 

position ml 1 and where 8A\ represents the line flow 

(d) calculating two rank-3 factor matrices from the modified shift data matrix using 
SVD, one rank-3 factor matrix; corresponding to the 3D structure and the other rank-3 factor 
matrix corresponding to the camera motion; 

20 (e) recovering the 3D structure from the 3D structure matrix by solving a linear 

NECI1071 35 g\nec\1196\13701\spec\13701pjl 



13701.pje 

equation; and 

(f) recovering the camera motion from the camera motion matrix using the recovered 
3D structure. 

5 9. The method of claim 8, wherein the image data is one or more selected from the 
group consisting of points, lines and intensities. 



10 



15 



20 



25 



10. The method of claim 8 5 wherein step (b) comprises: 

computing H and D CH , where H is a (N tot - 3)x N tot matrix and D CH = C" 1/2 D H T 
and N tot = 2N p + 2N£+ Nx where N p , Nl, and Nx equals the number of points, lines and 
intensities, respectively, and C is a (N { -1) x (Nj -1) matrix with C ir = S ir + 1 where C w is a 
constant table of values for all images and<5, r is an operator equal to 1 if/ = /' and 0 if i * /' 
and D = [S a> L A ^ A] , H T is the transpose of the H matrix, where H is a matrix defined 
such that H T is an identity matrix and annihilates ¥ x , % , Y z where = [yJ <b l Y£ m^l ] 
and similarly for the y and z components and where^ and co L are constant weights that the 



user sets, and where = 



(V(*xA)}' 
{P £ -(ixA)}_ 



XV = 

* Ly - 



{V(y*A)}" 

_{P t -(yxA)}_ 



XV = 
? T Lz — 



(V(z*A)}" 
{ Pi -(zxA)}_ 



where 



A x is the unit normal to the plane containing the line L and the camera center at the reference 
image, and where x, % z are unit vectors in the x, y and z directions, and where 

where q=(x,y) is the image position of the 

ATWJJ 

tracked point in the reference image and 

the three point rotational flows r {1) (x,y),r {2) (x,y),r {3) (x,y) are defined by 

[ 





>,">(?)}' 










X 










.{-?(*)}. 



and where 



¥ fc ^-{v/.r 0 ^)},^ ^-{w-r^k)},^ =-{ v/ - r(3) (p)} 5 and where;? gives the image 
coordinates of the pixel positions, and 
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where A is an Nj x 2N L matrix where each row corresponds to a different image / and 
equals \p v -S^f \p l -SA 1 } 7 where and P L are a unit 3-vector projecting onto two 
directions A ; x (f x A l )and zxA / which we refer to respectively as the upper and lower 
directions, where SA { is the line flow SA i l sAj-Aj, and z is the unit vector in the z 
5 direction and where S is a N i x 2N p matrix where each row corresponds to a different image 

/ and equals [{s^ f \s[ f ] , where s* m = q l m - q° m and denotes the image displacement for the 

m-th tracked point and where A is a x N x matrix, where each row corresponds to an 

image i and equals {AT f where AI is the change in image intensity with respect to the 

reference image and where denotes the i-th intensity image and where V n - V (p H ) denotes 

10 the image intensity at the /?-th pixel position in/' and where the notation { F}is used to 
denote a vector with elements given by the V a . 



1 1 . The method of claim 8, wherein step (d) comprises: 
1 5 computing the best rank-3 factorization of D CH are rank 

3 matrices corresponding respectively to motion and structure, using SVD. 



12. The method of claim 8, wherein step (e) comprises: 



20 



25 



eliminating structure unknowns g z , B z and Z" 1 from the O a to get 3N tot linear constraints 

on the U and Q using the linear equation, [<P X ® y <D z ] = H T S [3) U + fF x W y W 2 lfl 9 where U 

and Q are unknown 3x3 matrices, and O and *F represent total translational and rotational 
flow vectors respectively, and solving these constraints with 0(N tot ) computations using the 

<t>. 

SVD, where the total translational flow vectors are defined by O x s 



and similarly 



for the y and z components, and 



.{A x B L )r 



where A is the normal to a first plane that 



passes through the center of projection and the imaged line in the reference image, and the 
NECI 1071 37 g\nec\l 196M3 70 l\spec\ 1 370 lpj 1 
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normal to a second plane, B is defined by requiring B • A = 0 and B • Q = -1 for any point Q 
on the 3D line L, 

and where B v = B-P V and B L = B • ? L are the upper and lower components of B, and where 



O = 

^ X 



.{oi. 



o = - 



"{o}" 

M 



and where 



10 where Q is the 3d coordinate for a tracked 3D point in the reference image, and Z n is 

the depth from the camera to the 3D point imaged at the w-th pixel along the cameras optical 
axis; and 

recovering the structure unknowns Q 2 , and Z A from 
[o x O x O J = B T S {3) U + fp, ^¥ z ]q, given £7 and Q . 

13. The method of claim 8, wherein step (f) comprises: 
using S {3) U « [o x O y O z ] and 

D OT * C"^ {r x }<P r H T + C^ 2 {?; }<i^H T + C"^ {T z }o[H t to recover the translations, and 
recovering the rotations cd\ ? co l y , 0) l z from 

20 ^ + + = C^Di - f C% {{T x }W + {?; }o; + }0[ )T , wherein C is a 



constant (Nj-1) x (1^-1) matrix with C- r = + 1 and T represents the translation. 
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A METHOD FOR RECOVERING 3D SCENE STRUCTURE AND CAMERA 
MOTION FROM POINTS, LINES AND/OR DIRECTLY FROM THE IMAGE 

INTENSITIES 

5 ABSTRACT OF THE INVENTION 

An algorithm for recovering structure and motion from points, lines and/or image intensities. 
The algorithm combines feature based reconstruction and direct methods. The present 
invention is directed to a method for recovering 3D scene structure and camera motion from 
image data obtained from a multi-image sequence, wherein a reference image of the sequence 

10 is taken by a camera at a reference perspective and one or more successive images of the 
sequence are taken at one or more successive different perspectives by translating and/or 
rotating the camera. The method comprising the steps of (a) determining image data shifts 
for each successive image with respect to the reference image; the shifts being derived from 
the camera translation and/or rotation from the reference perspective to the successive 

15 different perspectives; (b) constructing a shift data matrix that incorporates the image data 
shifts for each image; (c) calculating two rank-3 factor matrices from the shift data matrix 
using SVD, one rank-3 factor matrix corresponding the 3D structure and the other rank-3 
factor matrix corresponding the camera motion; (d) recovering the 3D structure from the 3D 
structure matrix using S VD by solving a linear equation; and (e) recovering the camera 

20 motion from the camera motion matrix using the recovered 3D structure. 
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